We have seen how to get the thread ID for threads in a single block, simply by using index = threadIdx.x;
which is sufficient when the grid has a single block.
But that index is not unique within the grid if the grid has multiple blocks.
For example, in a grid of four blocks with four threads each, every block reuses the same thread indices:
- blockIdx.x=0: threadIdx.x = 0, 1, 2, 3
- blockIdx.x=1: threadIdx.x = 0, 1, 2, 3
- blockIdx.x=2: threadIdx.x = 0, 1, 2, 3
- blockIdx.x=3: threadIdx.x = 0, 1, 2, 3
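
One way to see the collision from the host side is to let every thread increment a counter at position threadIdx.x. The sketch below assumes a 4 x 4 launch like the listing above; the kernel name `count_by_thread_idx` and the helper `count_collisions` are made up for this illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread bumps the counter at its block-local index.
// atomicAdd is needed because threads from different blocks hit the same slot.
__global__ void count_by_thread_idx(int *counts)
{
    atomicAdd(&counts[threadIdx.x], 1);
}

// Hypothetical helper: launches the kernel and returns one counter per threadIdx.x value.
std::vector<int> count_collisions(int blocks, int threads_per_block)
{
    std::vector<int> host(threads_per_block, 0);
    int *dev = nullptr;
    cudaMalloc(&dev, threads_per_block * sizeof(int));
    cudaMemset(dev, 0, threads_per_block * sizeof(int));

    count_by_thread_idx<<<blocks, threads_per_block>>>(dev);

    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaMemcpy(host.data(), dev, threads_per_block * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    std::vector<int> counts = count_collisions(4, 4);
    for (int i = 0; i < 4; ++i)
        printf("threadIdx.x = %d was used by %d threads\n", i, counts[i]);
    return 0;
}
```

With four blocks, each counter should end up equal to the number of blocks, which is exactly why threadIdx.x alone cannot identify a thread within the grid.
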
To get a unique index for each thread, we also have to make use of the blockIdx.x variable.
Just as the index of a thread within its block is given by
- threadIdx.x
- threadIdx.y
- threadIdx.z
the index of a block within the grid is given by
- blockIdx.x
- blockIdx.y
- blockIdx.z
To summarise, we have the following variables: the sizes of the grid and of the blocks are given by gridDim and blockDim, and the indices of blocks within the grid and of threads within a block are given by blockIdx and threadIdx.
- dim3 gridDim; (size of the grid)
  - unsigned int gridDim.x; : number of blocks in x
  - unsigned int gridDim.y; : number of blocks in y
  - unsigned int gridDim.z; : number of blocks in z
- dim3 blockDim; (size of each block)
  - unsigned int blockDim.x; : number of threads in x
  - unsigned int blockDim.y; : number of threads in y
  - unsigned int blockDim.z; : number of threads in z
- uint3 blockIdx; (index of a block within the grid)
  - unsigned int blockIdx.x; : index of the block in x direction
  - unsigned int blockIdx.y; : index of the block in y direction
  - unsigned int blockIdx.z; : index of the block in z direction
- uint3 threadIdx; (index of a thread within its block)
  - unsigned int threadIdx.x; : index of the thread in x direction
  - unsigned int threadIdx.y; : index of the thread in y direction
  - unsigned int threadIdx.z; : index of the thread in z direction
gridDim and blockDim are fixed by the configuration parameters given when the kernel is launched; each thread then sees its own blockIdx and threadIdx while the kernel runs.
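
As a rough illustration of how the launch configuration shows up in gridDim and blockDim, the sketch below launches a kernel with dim3 values and copies what the device sees back to the host. The kernel `record_dims`, the helper `query_dims`, and the 3 x 2 grid are assumptions chosen for this example, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A single thread (the first thread of the first block) copies the
// built-in size variables to global memory.
__global__ void record_dims(unsigned int *out)
{
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        out[0] = gridDim.x;  out[1] = gridDim.y;  out[2] = gridDim.z;
        out[3] = blockDim.x; out[4] = blockDim.y; out[5] = blockDim.z;
    }
}

// Hypothetical helper: launches with the given configuration and
// fills host[0..5] with gridDim.{x,y,z} and blockDim.{x,y,z}.
void query_dims(dim3 grid, dim3 block, unsigned int host[6])
{
    unsigned int *dev = nullptr;
    cudaMalloc(&dev, 6 * sizeof(unsigned int));
    record_dims<<<grid, block>>>(dev);
    cudaMemcpy(host, dev, 6 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

int main()
{
    dim3 grid(3, 2);  // 3 x 2 x 1 blocks; components left out default to 1
    dim3 block(4);    // 4 x 1 x 1 threads per block

    unsigned int dims[6] = {0};
    query_dims(grid, block, dims);

    printf("gridDim  = (%u, %u, %u)\n", dims[0], dims[1], dims[2]);
    printf("blockDim = (%u, %u, %u)\n", dims[3], dims[4], dims[5]);
    return 0;
}
```

A plain integer launch such as `<<<3,4>>>` is converted to dim3 the same way, so it corresponds to a grid of (3, 1, 1) blocks with (4, 1, 1) threads each.
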
Let us say we have two blocks with 4 threads each. To get a unique index for every thread we should use:
index = threadIdx.x + blockIdx.x * blockDim.x;
where blockDim.x gives the number of threads per block (4 here).
threadIdx.x | blockIdx.x * blockDim.x | index |
---|---|---|
0 | 0 * 4 | 0 |
1 | 0 * 4 | 1 |
2 | 0 * 4 | 2 |
3 | 0 * 4 | 3 |
0 | 1 * 4 | 4 |
1 | 1 * 4 | 5 |
2 | 1 * 4 | 6 |
3 | 1 * 4 | 7 |
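
The table can also be checked from the host by having every thread store its index in an array at that same position. This is only a sketch under the two-blocks-of-four assumption used above: the kernel `write_unique_index`, the helper `fill_with_indices`, and the bounds guard are additions for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its grid-wide index into its own array slot.
__global__ void write_unique_index(int *out, int n)
{
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)  // guard in case the grid has more threads than elements
        out[index] = index;
}

// Hypothetical helper: runs the kernel and returns the filled array.
std::vector<int> fill_with_indices(int blocks, int threads_per_block)
{
    int n = blocks * threads_per_block;
    std::vector<int> host(n, -1);
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    // Set every byte to 0xFF so untouched slots read back as -1.
    cudaMemset(dev, 0xFF, n * sizeof(int));

    write_unique_index<<<blocks, threads_per_block>>>(dev, n);

    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    std::vector<int> out = fill_with_indices(2, 4);
    for (int i = 0; i < static_cast<int>(out.size()); ++i)
        printf("out[%d] = %d\n", i, out[i]);
    return 0;
}
```

If the formula gives every thread a distinct index, no slot is left at -1 and no two threads write to the same slot, so out[i] should equal i for all eight elements.
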

    #include <iostream>
    #include <stdio.h>

    // Prints the index of each thread within its own block.
    __global__ void threadID()
    {
        int threadindex = threadIdx.x;
        printf("Executing thread with index =%d withing the block \n", threadindex);
    }

    // Prints the index of the block each thread belongs to.
    __global__ void blockID()
    {
        int blockindex = blockIdx.x;
        printf("Executing thread within the block ID %d \n", blockindex);
    }

    // Prints the index of each thread within the whole grid.
    __global__ void unique_index()
    {
        int index = threadIdx.x + blockIdx.x * blockDim.x;
        printf("Executing thread with index %d \n", index);
    }

    int main()
    {
        // Every launch uses 3 blocks of 4 threads each.
        threadID<<<3,4>>>();
        cudaDeviceSynchronize();  // wait for the kernel so its output is flushed
        std::cout << std::endl;

        blockID<<<3,4>>>();
        cudaDeviceSynchronize();
        std::cout << std::endl;

        unique_index<<<3,4>>>();
        cudaDeviceSynchronize();
        return 0;
    }

Executing thread with index =0 withing the block
Executing thread with index =1 withing the block
Executing thread with index =2 withing the block
Executing thread with index =3 withing the block
Executing thread with index =0 withing the block
Executing thread with index =1 withing the block
Executing thread with index =2 withing the block
Executing thread with index =3 withing the block
Executing thread with index =0 withing the block
Executing thread with index =1 withing the block
Executing thread with index =2 withing the block
Executing thread with index =3 withing the block
Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 2
Executing thread within the block ID 2
Executing thread within the block ID 2
Executing thread within the block ID 2
Executing thread with index 4
Executing thread with index 5
Executing thread with index 6
Executing thread with index 7
Executing thread with index 0
Executing thread with index 1
Executing thread with index 2
Executing thread with index 3
Executing thread with index 8
Executing thread with index 9
Executing thread with index 10
Executing thread with index 11
You might wonder: why are the threads not executing simultaneously?